SIMD-FP solution | |
addps/addss | parallel addition / scalar addition |
subps/subss | subtraction |
maxps/maxss | max |
minps/minss | min |
andps | logical and |
orps | logical or |
xorps | logical xor |
cmpps/cmpss | comparison (type specified by immediate) |
mulps/mulss | multiplication |
divps/divss | division |
sqrtps/sqrtss | square root |
rcpps/rcpss | reciprocal (approximate) |
rsqrtps/rsqrtss | square root reciprocal (approximate) |
comiss | compare and store result in EFLAGS |
ucomiss | as comiss, but unordered: a quiet NaN operand does not raise the invalid exception (only a signaling NaN does) |
shufps | shuffle contents of XMMn register (for example can reverse order of floats) |
unpckhps | similar to punpckhwd |
unpcklps | similar to punpcklwd |
cvtpi2ps | packed signed dwords from MMn to floats in low XMMn |
cvtps2pi | opposite to cvtpi2ps |
cvttps2pi | like cvtps2pi, but with truncation |
cvtsi2ss | dword integer register to float in low XMMn |
cvtss2si | opposite to cvtsi2ss |
cvttss2si | like cvtss2si, but with truncation |
movaps | move aligned data |
movups | move unaligned data |
movhps | move 64 bits (two floats) between memory and high part of XMMn register |
movlps | the same, but concerns low part of XMMn register |
movhlps | move high part to low part |
movlhps | move low part to high part |
movntps | move non-temporal data from XMMn to aligned memory area |
movmskps | move sign bits from XMMn to an integer register |
movss | move only the lowest FP number |
ldmxcsr | set SSE control register |
stmxcsr | get SSE control register |
MMX extension introduced with SSE, also available on Athlon | |
pshufw | shuffle words |
pinsrw | insert a word from integer register |
pextrw | extract a word to integer register |
psadbw | sum of absolute differences of unsigned bytes |
pminub/pminsw | min |
pmaxub/pmaxsw | max |
pavgb/pavgw | average of bytes and words |
pmulhuw | like pmulhw, but the operands are treated as unsigned (at last!!!) |
sfence | store fence - ensure that store queue is empty |
prefetcht0 | prefetches temporal data into all cache levels, including the closest one |
prefetcht1 | the need for this instruction is questionable |
prefetcht2 | ditto |
prefetchnta | prefetches non-temporal data |
movntq | move non-temporal data from MMn to aligned memory area |
maskmovq | move to [edi] using byte mask in source MMn register |
pmovmskb | move most significant bits of bytes in MMn to integer register |
As a side note, some of the new instructions, particularly movntps, movntq and maskmovq (plus sfence, which orders such stores), were introduced to bypass the cache and write data directly to memory. Intel introduced this technique with Rambus memory in mind, which later turned out to have too low bandwidth for systems with 64MB+ of RAM.
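The cache-bypassing stores mentioned above can be sketched in C. This is only an illustration, assuming a compiler that ships the SSE intrinsics header `<xmmintrin.h>`, where `_mm_stream_ps` maps to movntps and `_mm_sfence` maps to sfence. The helper name `fill_streaming` is hypothetical and not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_set1_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper (an assumption for illustration): fill n floats
   at a 16-byte aligned address with `value` using non-temporal stores.
   n must be a multiple of 4, because each store writes four floats. */
static void fill_streaming(float *dst, size_t n, float value)
{
    assert(((uintptr_t)dst & 15) == 0);  /* movntps requires 16-byte alignment */
    assert(n % 4 == 0);

    __m128 v = _mm_set1_ps(value);       /* broadcast value to all four lanes */
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);       /* movntps: store with a non-temporal hint */

    _mm_sfence();                        /* sfence: order the streamed stores
                                            before any later stores */
}
```

A call such as `fill_streaming(buf, 8, 1.5f)` on a 16-byte aligned `buf` of eight floats leaves every element equal to 1.5. Whether this is faster than ordinary stores depends on the buffer size and the memory system, so it is worth measuring before relying on it.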